BACKGROUND


Competitive  computer  system  selection  requires  a tool  for 
minimum  performance  measurement.  The  selection  process 
must  be  fair  and,  ideally,  brief  and  economical.  Thus,  the 
measurement  tool  must  be  visibly  fair  and  impartial  in  its 
measurement  of  a computer  system,  it  must  relate  what  is 
being  measured  to  user  needs,  and  it  must  be  economical  to 
apply. The thrust of several ongoing “standard benchmark”
efforts  in  the  Department  of  Defense  and  other  Federal 
Government  agencies  is  to  develop  a measurement  tool  with 
these  qualities. 

There  are  several  characteristics  of  computer  systems 
which  can  be  measured  for  the  purpose  of  selection: 

(a) Availability of equipment and software, in terms of
reliability,  maintenance  time,  and  the  like. 

(b)  Work  capacity,  which  can  be  measured  from  a variety 
of  viewpoints.  Job  time  is  a single-job  measure  and,  therefore, 
not  often  used.  System  throughput  is  a measure  of  how  much 
work  is  done,  and  is  a function  of  the  job  mix  and  job  load, 
as  well  as  various  system  parameters.  Response  time  is  a 
measure  of  the  quality  of  service  rendered,  and  is  largely 
dependent  on  operating  system  and  hardware  characteristics. 

(c)  Functional  capabilities  are  susceptible  to  qualitative 
judgments,  but  demonstrations  of  these  capabilities  are  often 
required  of  computer  system  vendors  (e.g.,  a demonstration 
of  an  on-line  text  editor). 


In the context of computer selection, we have felt it prudent
to limit the scope of our efforts to measuring throughput
capacity, recognizing, however, that the other factors may
take on paramount importance under varying circumstances.

Relation  to  performance  evaluation 

It is important that we recognize the affinity of any benchmark
study to the subject of computer performance evaluation,
since some combination of evaluation techniques will of
necessity be used in the development of “standard benchmarks.”
These techniques can be broadly classified and characterized
as follows:1

(a) Task-oriented techniques concern themselves with system
throughput capabilities with respect to a given workload.
Simple instruction timings reduce the “workload” to specific
classes of instructions (add time, floating-point multiply,
etc.). Instruction mixes consist of “representative” samples
of instruction sets designed to reflect the degree to which
each instruction class is used for a given type of application.
These are adequate for estimating processor power, but
completely ignore memory, degree of multiprogramming, I/O
loads, etc. Kernels are relatively small sequences of code
performing a single (simple) function (e.g., a table search),
and, again, are designed primarily for measuring processing
power. The timings for kernels may be obtained by actually
executing them or by hand calculations. Benchmarks consist
of a subset of a given workload (“natural” benchmarks), a
subset which has been further modified (“hybrid” benchmarks),
or a set of programs written specifically for the purpose
of making a comparative evaluation (“synthetic” programs).
Benchmarks are processed on the configurations being
evaluated or compared, and the processing time is used as a
relative figure of merit.

(b) The emphasis in component-oriented evaluation techniques
is on the system being evaluated rather than on the
workload to be processed by this system. Hardware monitors
are relatively inexpensive, precise in what they measure,
and non-disruptive, but insensitive to data-dependent
information. The characteristics of software monitors are almost
the precise opposite of those for hardware monitors. The
convenience of queueing models is offset by their inaccuracy
and shallowness. Stochastic models (simulation models) are
less imprecise but costly, and suffer from a credibility gap.


Problems  with  natural  or  hybrid  benchmarks 

Figure 1 - Example of VP-Routine input, population file form of audit
routines, and compilation-time form of audit routines

Benchmarks have for some period of time constituted the
accepted form of minimum performance measurement in
computer selection throughout the Federal marketplace.
Natural or hybrid benchmarks have the advantages of dealing
with a real system (thus avoiding half of the simulation
credibility problem) and a “semi-real” job mix. Among the
more serious problems associated with benchmarks are the
following:

(a) It is extremely difficult, except in the simplest situations,
to construct a set of benchmark programs which
accurately reflects a given job mix. This of course is a problem
common to any performance measurement technique,
since the nature of “a given job mix” is dependent on a
multitude of parameters, many of which are system dependent
(e.g., Execute Channel Program instruction counts
are often used to measure I/O time on IBM S/360 or S/370
systems, but these instructions have little meaning outside
the S/360-370 series, and often have no precise counterparts
on other systems) and most of which are time dependent.

(b)  They  are  generally  non-portable  (system  dependent) 
and often do not run correctly, even on their native system.

(c) They are prepared and processed using a variety of
procedures, resulting in unduly long execution times,
unreasonable file volumes, and inconsistent measurement
procedures. This author has seen benchmarks for which the
required processing time was more than three hours, and
the file population resided on two dozen (full) tape reels!
In some cases only processor time is measured; in others, all
components (including, e.g., printers) must halt before timing
stops.

(d)  The  above  problems  result  in  extremely  high  costs,  to 
buyers  and  vendors,  in  terms  of  both  time  and  money.  It  is 
not  unusual  for  a vendor  to  spend  6-9  calendar  months  just 
to  prepare  the  submitted  benchmarks  for  processing,  or  for 
the cost of processing them to be 10 percent or more of the
eventual  bid  price. 


SCOPE  OF  THE  U.  S.  NAVY  EXPERIMENT 

The  Software  Development  Division  of  the  Department 
of  the  Navy  Automatic  Data  Processing  Equipment  Selection 
Office  (ADPESO)  is  performing  an  experiment  to  determine 
the suitability of synthetic programs in alleviating the
problems created by natural and hybrid benchmarks.

The  experiment  began  in  June  1973,  with  the  development 
of  a small  (5  program)  reference  library  of  synthetic  programs. 
We assumed that synthetic programs could be written so
that relatively few parameters control their behavior; that
experimentation could be performed on these programs so that
their behavior relative to changing parameter values would
be predictable; that a workload could be specified in terms of
the parameters implicitly defined by the synthetic programs;
and that synthetic program parameters could be set so
as to reflect this workload.

The use of synthetic programs in performance evaluation
does not represent a new concept. Dopping,2 and Gosden and
Sisson3 reported on experiments in the use of synthetic
programs as far back as 1962. More recent suggestions on their
use have come from Joslin4 and Buchholz.5 Our aims have
been to obtain quantitative profiles of certain synthetic
programs and to determine the scope of their feasible utility.


RELATED  EFFORTS 

There  are  several  complementary  efforts  in  the  Federal 
Government  aimed  at  designing  representative  benchmarks. 

The U. S. Army Computer System Support and Evaluation
Command has recently issued a solicitation for a “Standard
Benchmark Study.” The contract objectives are (a) the
definition of all tasks and measurable functions performed
by a computer in executing business-type applications; and (b)
development of a method or technique of identifying and
measuring the occurrence of each function or parameter in
each task for the purpose of profiling computer workloads.
This solicitation is the result of a careful study on the part
of a Department of Defense Joint Steering Committee which
has, among other things, defined a preliminary set of
application tasks and task parameters for benchmark purposes.

The Department of Agriculture has constructed a comprehensive
set of benchmark programs which include transaction
processing and data base management applications.
There  is  much  in  this  package  which  should  be  carefully 
studied  as  part  of  any  effort  at  designing  a library  of  standard 
benchmark  programs. 

The  Department  of  Labor  is  developing  a job  selection 
simulation model6 using actual utilization statistics as control
parameters.  Although  the  goals  here  are  somewhat  different 
from  those  of  the  “standard  benchmark  effort”  there  may 
be  some  related  spinoff  benefits. 

A similar project is being carried on by the Marine Corps using
hardware monitors to provide data for the synthetic creation
of jobs.7

RESULTS

The  programs 

Five  processing  tasks  were  selected  as  representing,  in 
varying  combinations,  a broad  variety  of  application  tasks. 
These  were  sequential  file  processing,  indexed  sequential  file 
processing,  relative  I/O  processing,  sorting,  and  computation. 

Programs  were  written  to  perform  each  of  these  tasks. 
Because most of the Navy’s present benchmark needs relate
to COBOL-oriented workloads, all of the reference library
programs are written in American National Standard COBOL.
Additionally, all the programs are in “system independent”
form. This is accomplished through the use of an
executive program, the VP-Routine. The VP-Routine was
developed in 1969 by the Department of the Navy as part
of its COBOL Compiler Validation System.8 It is used to
resolve implementor names (e.g., in the ENVIRONMENT
DIVISION), modify compile-time parameters (e.g., record
sizes, precision, blocking factors), and automatically generate
job control instructions appropriate to the system we are
executing under (Figure 1).

Each program is controlled by a set of compile time and
execution time parameters. Figures 2-6 identify these for each
of the five programs. The ability to vary automatically certain
parameters at compile time provides us with the flexibility
to develop a fairly rich mix from just a few basic programs.

We  have  adopted  certain  design  principles  which,  while 
applicable to software design in general, we felt were
particularly important to this project.

(a) We have attempted to make every detail of the structure
of each program visible and understandable to a prospective
user. This is a prerequisite to a “sellable” product.

(b) The design of each program is consistent with that
of the others. We have used “modular programming”
throughout, although frankly, this was simply a reflection
of following long accepted standards of good programming
practice. We maintained consistency in the binding time of
parameters across programs. Thus, if a given parameter is
bound at compile time in one program it is bound at compile
time in all the programs. Also, all files used by a program are
generated by that program (eventually, the file generation
modules may be combined into one program).

(c) We have isolated the function of each of the program
parameters so as to render each parameter independent of
the others. This was necessary to avoid facing an exponentially
rising set of options in setting parameters to control
program behavior. This was a difficult principle to follow
since, for example, a simple specification such as how one is
to control I/O time can be made in terms of file size, blocking
factor, logical record size, etc. In this case we could choose
to use file size to affect time, blocking factor to impact
buffering, and maintain logical record size constant.

Figure 3 - ISAM module parameters

Figure 5 - SORT module parameters

(d) Only those functions which were felt essential to the
accurate modeling of a task were included in each program.
Thus we opted for a clearly defined scope and simplicity
rather than complexity. We feel this was particularly
important in the selection of synthetic program functions and
parameters, since a lack of frugality could have led to a level of
complexity in the programs which would have rendered them
completely unamenable to analysis.

(e)  The  design  of  each  program  (and  of  the  set  of  programs 
as a whole) lends itself to extension, so that a wide range of
task  characteristics  can  be  accommodated. 
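
The cost that principle (c) is meant to avoid can be illustrated
with a small calculation. The sketch below is not part of the
reference library, and the parameter names and counts of settings
are hypothetical. It compares the number of runs needed when every
combination of parameter values must be measured with the number
needed when each parameter can be characterized on its own.

```python
from math import prod

# Hypothetical parameters and the number of settings tried for each.
settings = {
    "file_size": 6,
    "blocking_factor": 4,
    "record_size": 5,
    "compute_passes": 8,
}

def runs_if_interacting(levels):
    """Every combination must be measured: product of the level counts."""
    return prod(levels.values())

def runs_if_independent(levels):
    """Each parameter is varied alone while the others are held fixed."""
    return sum(levels.values())

interacting = runs_if_interacting(settings)
independent = runs_if_independent(settings)
print(interacting, independent)
```

Adding a fifth parameter multiplies the first count but only adds
to the second, which is the sense in which the set of options
rises exponentially when parameters are allowed to interact.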

Each program is self-documented. A “prologue” is included
for each and commenting is plentiful, though pertinent.
External documentation consists of a “module overview”
(see Figure 7), parameter specifications, experimental
results, and a User Guide to assist an organization in
implementing the programs and using the VP-Routine. We have
avoided lengthy descriptions and detailed flowcharts because
we question their usefulness.
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An  Experiment  in  the  Use  of  Synthetic  Programs  for  System  Benchmarking  435 


Figure 8 - Sequential file update time as a function of master file size;
no CPU activity, drum-resident files
(plot not reproduced; axes: memory seconds versus number of master
file records, detail file size is 10)

Figure 9 - Sequential file update time as a function of master file size;
no CPU activity, tape-resident files
(plot not reproduced; axes: memory seconds versus number of master
file records, detail file size is 10)


The programs, documentation, and VP-Routine are collected
on a 2400 foot magnetic tape reel. The User Guide
and  experimental  results  on  program  behavior  are  separately 
bound.  The  entire  package  is  in  the  public  domain. 

Examples  of  processing  results 

A complete  summary  of  processing  results  is  beyond  the 
scope  of  this  paper,  but  we  can  discuss  some  of  the  more 
interesting  of  those  results.  All  results  mentioned  are  based 
on  executions  on  a UNIVAC  1108  Unit  Processor,  under 
control  of  the  EXEC-8  Operating  System. 

The “sequential I/O” module is the simplest of the file
processing programs. Its function is to pass a master file
against a detail file, creating a new master file. The files
may reside on tape or direct access devices. A compute loop
may be performed a variable number of times each time a
master file record is updated. The processing includes a
table search, and the size of the table is used to control
memory requirements. All computations are self-checking.
The program is similar in these and other characteristics to
the PL/1 program described by Buchholz.5
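
The structure of such a module can be sketched in a few lines.
The reference programs are written in COBOL; the Python below is
only an illustration of the master/detail pass described above,
and the record layout, the function name sequential_update, and
the form of the self-check are assumptions, not the Navy code.

```python
from bisect import bisect_left

def sequential_update(master, detail, table, compute_passes=1):
    """Pass a sorted master file against a sorted detail file.

    master: list of (key, balance) sorted by key
    detail: list of (key, amount) sorted by key
    table:  non-empty sorted list searched on every compute pass
            (its length stands in for the memory-controlling table size)
    Returns the new master file and the number of compute passes run.
    """
    new_master = []
    passes_run = 0
    d = 0
    for key, balance in master:
        # Skip detail records that match no master record.
        while d < len(detail) and detail[d][0] < key:
            d += 1
        updated = False
        while d < len(detail) and detail[d][0] == key:
            balance += detail[d][1]
            updated = True
            d += 1
        if updated:
            for _ in range(compute_passes):
                # Table search whose result is verified (self-checking).
                probe = key % len(table)
                pos = bisect_left(table, table[probe])
                assert table[pos] == table[probe]
                passes_run += 1
        new_master.append((key, balance))
    return new_master, passes_run

master = [(k, 100) for k in range(1, 11)]
detail = [(3, 5), (3, -2), (7, 10)]
new_master, passes = sequential_update(
    master, detail, table=list(range(50)), compute_passes=4
)
```

Here the master file size drives I/O volume, compute_passes
drives CPU activity, and the table length drives memory, so each
resource is controlled by a separate parameter.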

Predictably,  we  found  I/O  time  to  be  a linear  function  of 
master  file  size.  This  was  true  for  FASTRAND  (drum) 
resident  as  well  as  tape  resident  files.  Repeated  runs  during 
different  times  of  day  showed  that  the  curve  reflecting  the 
behavior  of  time  as  a function  of  master  file  size  remained  a 
straight  line  with  constant  slope,  although  the  intercept 
value  changed  (Figure  8).  In  all  these  runs,  only  the  master 
file  size  was  varied  (from  100  to  5000  records),  with  the  detail 
file size fixed at 10 records, and only one pass through the
compute  loop  was  performed. 
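
One way to check an observation of this kind (constant slope,
shifting intercept) is to fit a straight line to each series of
runs and compare the coefficients. The sketch below uses ordinary
least squares; the timings are invented for illustration and are
not the measured UNIVAC 1108 values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

records = [100, 500, 1000, 2000, 5000]
# Hypothetical elapsed times (seconds) for runs at two times of day.
quiet_run = [2.0 + 0.004 * n for n in records]
busy_run = [5.0 + 0.004 * n for n in records]

quiet_intercept, quiet_slope = fit_line(records, quiet_run)
busy_intercept, busy_slope = fit_line(records, busy_run)
print(quiet_slope, busy_slope, quiet_intercept, busy_intercept)
```

Equal slopes with different intercepts would correspond to a fixed
cost per master record plus a start-up cost that depends on the
load at the time of the run.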

We  processed  a series  of  similar  runs  with  all  files  residing 
on  UNIVAC  8-0  tapes.  Again,  running  the  program  in  a 
mix  did  not  change  the  linear  behavior  of  time  as  a function 
of  file  size  (Figure  9).  As  before,  the  detail  file  size  was  held 
constant, and only one pass through the compute loop was
performed on each record update. Thus, while other programs
in a mix clearly affect the quantitative behavior of a sequential
update task, they appear to have almost no effect on its
qualitative behavior.

CPU  time  turned  out  to  be  a linear  function  of  the  number 
of  repetitions  through  the  compute  loop. 

Execution of the “compute” module produced some interesting
results. The program generates a variable-sized table
of uniformly distributed pseudo-random numbers, performs
a “runs-up-and-down” test on them, and optionally produces
printer output. A parameter controlling the number
of  processing  iterations  is  used  to  vary  the  amount  of  CPU 
activity. 
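
A runs-up-and-down test of the kind named above can be sketched
as follows. This is an illustration in Python rather than the
COBOL module itself; the function names, the seed, and the choice
to regenerate the table on every iteration are assumptions. The
test statistic uses the standard normal approximation, in which a
random sequence of n values has an expected (2n-1)/3 runs with
variance (16n-29)/90.

```python
import random
from math import sqrt

def runs_up_and_down(values):
    """Count runs of consecutive increases or decreases in a sequence."""
    signs = [b > a for a, b in zip(values, values[1:]) if a != b]
    if not signs:
        return 0
    return 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def runs_z_score(values):
    """Standardized run count: mean (2n-1)/3, variance (16n-29)/90."""
    n = len(values)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return (runs_up_and_down(values) - expected) / sqrt(variance)

def compute_module(table_size, iterations, seed=1):
    """Generate a table of uniform pseudo-random numbers and test it,
    repeating the work `iterations` times to vary CPU activity."""
    rng = random.Random(seed)
    z = 0.0
    for _ in range(iterations):
        table = [rng.random() for _ in range(table_size)]
        z = runs_z_score(table)
    return z
```

In this sketch table_size plays the role of the memory parameter
and iterations the role of the CPU parameter; a z value far from
zero would indicate that the generated numbers are not behaving
like a random sequence.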


Figure 10 - Compute module CPU utilization as a function of number
of iterations in the computation loop

Figure 11 - Compute module CPU time utilization as a function of
number of iterations in compute loop

When the number of iterations reached a certain threshold
(usually 500) the CPU time varied linearly with this parameter.
Below that point, however, we noticed some fluctuations
(Figure 10). We believe this is due to the way the EXEC-8
dispatcher schedules jobs for CPU time. (It uses a variation
of Corbato's time quantum charging algorithm.9)

Figure 11 summarizes two executions, run under identical
conditions. The only difference was that in one the usage of
variables was “computational,” in the other “display.” As a
program becomes CPU bound an exorbitant price is paid
for the “machine independency” of data.

Figure 12 shows the relationship between memory time
(for a given program, a memory second is defined as the
occupation of 32K words of memory for a period of one
second, during which time the program is undergoing either
CPU or I/O activity) and the size of the file being sorted
for the “sort” module. Again, we found a linear behavior,
and this pattern was consistent regardless of other jobs in
the mix, time of day, etc. Fluctuations at the low end of the
line were due, as in other cases, to EXEC-8 allocation
characteristics.
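
The memory-second unit defined above can be expressed as a short
calculation. The sketch assumes that “32K words” means 32,768
words and that charges scale in proportion for programs occupying
more or less than that; neither point is stated in the text, and
the function name is hypothetical.

```python
WORDS_PER_UNIT = 32 * 1024  # assumption: "32K words" = 32,768 words

def memory_seconds(words_occupied, active_seconds):
    """Memory time for a program holding `words_occupied` words while
    it is active (CPU or I/O) for `active_seconds` seconds.
    Proportional scaling is an assumption of this sketch."""
    return (words_occupied / WORDS_PER_UNIT) * active_seconds

print(memory_seconds(65536, 10))
print(memory_seconds(16384, 4))
```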

Problems  encountered 

We feel confident, based on our tests thus far, that we can
indeed modify program parameters, for the modules we have
produced, in such a way that we can force a predictable
behavior on the programs, in terms of both time and pattern.
This, however, only tells us that we can control the programs,
a necessary but not sufficient condition if we are to create
synthetic benchmarks.

We  have  also  encountered  certain  difficulties  with  the 
synthetic program approach. Not all of these are unique to
this approach, but this offers us little solace. The following
were the most serious of these problems:

(a) Because synthetic programs tend to be stylized, they
may produce surprising results. For example, an optimizing
compiler can have a much greater impact on a
synthetic benchmark than on a natural one. Yet
user workloads are “natural,” not synthetic. We have
found that PERFORM sections which are called only
once, and not otherwise entered, are placed in-line by
many compilers, but not by all. This creates no difficulties
if a user creating a set of benchmarks knows
what his compiler does, but he does not have to know.
Also, sequences of code such as

I = I+1

A = I,

where I is a loop-control parameter (the syntax here
is FORTRAN but the principle is equally true of
COBOL) are generally not performed as such by an
even moderately intelligent compiler.

(b) Another problem we have encountered is that overwhelming
side effects can occur in overly parameterized
synthetic programs. For example, the COBOL PERFORM
verb translates to 14 instructions on one
system we executed under while the MOVE verb
translates to 1 instruction. Thus, using the PERFORM
instruction to vary the number of times a
MOVE instruction is executed leads to grossly misleading
results when the PERFORM itself is the
object of yet another PERFORM.

(c) One needs to understand the “native” system in some
detail in order to develop benchmarks purporting to
accurately reflect a given workload for that system.
Some of the test results cited above, for example,
were clearly due to the nature of the system on which
the programs were executed. This means that guidelines
on how to use the synthetic modules will differ
with differing systems. Also, it is easy to create an
unduly complex program (in terms of possible combinations
of parameters) if the architecture of the native
system is not understood. Repeating, for instance, a
series of COBOL MOVEs, varying field sizes each
time, accomplishes nothing more than what could be
accomplished by moving a fixed size variable on IBM
S/360 computers, since a single machine instruction,
MVC (move character), is used regardless of field
size. Yet, on a UNIVAC 1108, changes in object code
do occur at certain field sizes. Also, moves of literals,
numerics, and character fields are usually all performed
in the same way, so that incorporating all of
these in a program is simply adding to the combinations
of parameters without really contributing to
the value of the program.

(d) We see no evidence of a satisfactory way of modeling
a workload. Even a simple I/O-CPU analysis of a
file maintenance problem depends on a multitude of
parameters: proportion of active to passive records,
distribution and location of active records in the master
file, number of instructions executed per active/inactive
record, record size, frequencies with which instructions
are executed, etc. This difficulty is seriously
aggravated in a mix of programs. It is not at all clear
that techniques for matching job parameters to mix
parameters are feasible. The use of analytical models to
characterize a job mix and thereby provide inputs to
the synthetic programs1 is clearly unsatisfactory, since
the limiting factor would then become the analytical
techniques themselves. This class of techniques is
already regarded as grossly imprecise.

The  use  of  software  monitors  for  data  collection  is  likewise 
unacceptable  since  they  create  serious  instances  of  the 
“Hawthorne” effect.10 This could possibly be compensated
for,  but  with  considerable  difficulty. 

In fact, it is important to note that all suggestions on
how  to  model  a workload  rely  on  one  of  the  evaluation 
techniques  previously  surveyed  (monitors,  simulation,  etc.). 
Thus,  we  should  not  expect  the  synthetic  mix  approach  to 
be  an  improvement  over  these. 
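
The side effect described in problem (b) can be made concrete
with a little arithmetic. The sketch below takes the two figures
reported there (14 instructions per PERFORM, 1 per MOVE) and
assumes, as a simplification not stated in the text, that the
PERFORM cost is paid once per repetition at each nesting level.
The repetition counts are hypothetical.

```python
PERFORM_COST = 14  # instructions per PERFORM, as reported for one system
MOVE_COST = 1      # instructions per MOVE on the same system

def instruction_counts(outer_reps, inner_reps):
    """Return (move_instructions, perform_instructions) for a MOVE
    repeated by an inner PERFORM that is itself repeated by an
    outer PERFORM. Assumes the PERFORM cost recurs per repetition."""
    moves = outer_reps * inner_reps * MOVE_COST
    performs = (outer_reps + outer_reps * inner_reps) * PERFORM_COST
    return moves, performs

moves, performs = instruction_counts(outer_reps=100, inner_reps=10)
overhead_share = performs / (moves + performs)
print(moves, performs, round(overhead_share, 3))
```

Under these assumptions well over nine tenths of the executed
instructions belong to the control mechanism rather than to the
MOVE being “measured,” which is why the results are misleading.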

The  problem  of  “representativeness”  which  exists  in 
natural  benchmarks  will  simply  not  disappear  just  because 
we use synthetic programs. We have cited the system dependency
of workload parameters (particularly as they apply
to I/O time) and the sheer magnitude of the number of
combinations of program parameter values. An equally
crucial problem is the fact that the nature of a workload is
time dependent. Any attempt to condense a workload into a,
say, two-hour benchmark is bound to result in substantial
homogenization, and some important characteristics could
be lost. As a simple example, the annual workload of a computer
center, in terms of productive hours, is given in Figure
13. It suggests that there is plenty of excess capacity. Yet
the workload on a typical mid-week day shown in Figure 14
indicates that for this period the system was saturated. We
know of no satisfactory techniques which allow us to model
this behavior for the purpose of building benchmarks.

Figure 13 - Monthly utilization profile (Source: Annual Report,
University of North Carolina Computation Center, 1970)

Figure 14 - Daily utilization profile (Source: Annual Report,
University of North Carolina Computation Center, 1970)
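
The loss of information through averaging can be shown with a
toy profile. The hourly figures below are invented for
illustration and are not taken from the University of North
Carolina report; the function names are hypothetical.

```python
# Hypothetical hourly demand for one weekday, as a fraction of capacity:
# eight night hours, eight prime-shift hours, eight evening hours.
hourly_demand = [0.1] * 8 + [1.0] * 8 + [0.4] * 8

def utilization(demand):
    """Average utilization over the period, capped at full capacity."""
    return sum(min(d, 1.0) for d in demand) / len(demand)

def saturated_hours(demand, threshold=0.95):
    """Number of hours in which the system is effectively saturated."""
    return sum(1 for d in demand if d >= threshold)

daily_average = utilization(hourly_demand)
peak_hours = saturated_hours(hourly_demand)
print(round(daily_average, 2), peak_hours)
```

The average suggests a half-idle system while the prime shift is
saturated throughout; a benchmark tuned to the average would miss
exactly the period that matters for sizing.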

CONCLUSIONS 

Can a controllable job mix be constructed?

We  believe,  on  the  basis  of  our  experience  thus  far,  that 
task-oriented synthetic programs can be combined into a mix
which can be controlled to exhibit desired processing time,
memory, I/O time, and I/O device utilization characteristics.
There have been other efforts that bear this out.11
We  plan  additional  testing  on  a variety  of  systems  so  as  to 
learn  more  about  some  of  the  system  dependencies  we  have 
encountered. 

Can a workload be profiled?

We do not believe that it is possible to arrive at a generalized,
comprehensive, and accurate model of system workloads
except in the most trivial cases. We can certainly
retrofit. That is, we can accept a workload definition based
on the synthetic program parameters. We also believe that
this need not impede the use of synthetic programs in
benchmarks. In this, we strongly support the view expressed by
J. C. Strauss. In a recent paper12 on the use of natural
benchmarks, he stated that, based in part on prior experience and
on the difficulties encountered, “it was felt more important
that the behavior of the benchmarks be well understood and
cover a broad range of important system features than that
the complete benchmark series be representative of the
general workload.”

Other uses for synthetic programs

Isolated system characteristics can be exercised using synthetic
programs. We have in fact used the I/O modules in
our reference set to test various operating systems' data
management capabilities. Synthetic programs also serve as
convenient tools to determine the impact of certain programming
practices, as was done in using the “compute”
module to measure the degradation, on a specific system,
resulting from COBOL DISPLAY mode computation.

A recommendation

We feel our testing has substantiated our original assumptions.
A small number of simple, task-oriented, synthetic
programs can be combined into a fairly rich and
versatile job mix. A relatively small number of parameters is
sufficient to enable a single program to reflect the characteristics
of a broad class of applications. Also, individual modules
have proven useful in exercising isolated computer system
features, such as I/O handling. Finally, if one accepts a
“modest” workload characterization, aimed more at reflecting
extremities and crucial areas than at comprehensiveness,
it is possible and reasonable to construct a
benchmark from a set of synthetic modules.

Synthetic programs are neither difficult nor expensive to
produce.  Our  present  set,  admittedly  small,  was  designed, 
coded,  and  debugged  in  two  calendar  months.  An  additional 
three  months  were  required  for  experimentation,  packaging, 
and  system  documentation.  These  times  do  not  consider  the 
VP-Routine,  which  was  already  available.  Total  manpower 
used  for  the  effort  amounted  to  four  man-months.  Total 
cost,  including  machine  time,  clerical  support,  and  salaries 
was  under  $0,000.  Furthermore,  the  system  is  available  to 
anyone  upon  request.  Thus,  we  feel  we  have  made  a small 
investment for a product which has already given a substantial
payoff, in what we have learned if nothing else.


A reference set of “controllable” programs is a useful tool
for any data processing installation. Our concern was primarily
with benchmarks for system selection. We have indicated
that performance measurement is a related area of
application. System sizing, throughput estimates against a
changing workload, expected response time to a varying
stimulus, and availability measurements are other reasonable
applications for a set of synthetic modules. The modesty of
the effort required to produce such a set certainly commends
further study.